Two-Level Metadata Management for Data Deduplication System

Authors

  • Jin San Kong
  • Min Ja Kim
  • Wan Yeon Lee
  • Woong Ko
Abstract

Data deduplication is an essential solution for reducing storage space requirements. Chunking-based data deduplication is especially effective for backup workloads, which tend to consist of files that evolve slowly, mainly through small changes and additions. In this paper, we introduce a novel data deduplication scheme that can be used efficiently and quickly over a low-bandwidth network. The key ideas are tree-map searching and classifying data into global data and metadata; these are the main factors behind the fast performance of the data deduplication.
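The two-level idea in the abstract (a fast searchable metadata level consulted before a global fingerprint store) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the class and method names are invented, SHA-1 fingerprinting is assumed, and a plain dictionary stands in for the tree map.

```python
import hashlib


class TwoLevelIndex:
    """Hypothetical sketch of a two-level deduplication index:
    a small local metadata map is consulted first, backed by a
    global fingerprint store. All names are illustrative only."""

    def __init__(self):
        self.local = {}    # hot metadata level (stands in for a tree map)
        self.global_ = {}  # global level: fingerprint -> chunk id
        self.next_id = 0

    def store_chunk(self, chunk: bytes) -> int:
        """Return the id of the stored chunk, deduplicating by fingerprint."""
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in self.local:            # fast path: local metadata hit
            return self.local[fp]
        if fp in self.global_:          # slower path: global lookup
            self.local[fp] = self.global_[fp]
            return self.local[fp]
        cid = self.next_id              # new unique chunk
        self.next_id += 1
        self.global_[fp] = cid
        self.local[fp] = cid
        return cid
```

With this layout, repeated chunks in a backup stream resolve in the local map without touching the larger global store, which is where such a scheme would save lookup time and network traffic.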


Similar Resources

Design and Implementation of a Library Metadata Management Framework and its Application in Fuzzy Data Deduplication and Data Reconciliation with Authority Data

We describe the application of a generic workflow management system to the problem of metadata processing in the library domain. The requirements for such a framework and the real-world forces acting on it are examined. The design of the framework is laid out and illustrated by means of two example workflows: fuzzy data deduplication and data reconciliation with authority data. Fuzzy data deduplication ...


Metadata Considered Harmful...to Deduplication

Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible with deduplication: they intersperse metadata ...


Design and Implementation of an Open-Source Deduplication Platform for Research

Data deduplication is a technique used to improve storage utilization by eliminating duplicate data. Duplicate data blocks are not stored; instead, a reference to the original data block is updated. Unique data chunks are identified using techniques such as hashing, and an index of all existing chunks is maintained. When new data blocks are written, they are hashed and compared to the has...
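The chunk-index workflow this blurb describes (hash each incoming block, consult the index, store only unique chunks, reference the rest) can be sketched as follows. Fixed-size chunking, SHA-256, and the function name are assumptions made for illustration, not details taken from the platform.

```python
import hashlib


def dedup_write(stream: bytes, chunk_size: int, index: dict) -> list:
    """Split `stream` into fixed-size chunks and store only unique
    ones in `index` (fingerprint -> chunk bytes). Returns the ordered
    list of fingerprints referencing the stream's chunks."""
    refs = []
    for off in range(0, len(stream), chunk_size):
        chunk = stream[off:off + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:   # new unique chunk: store it once
            index[fp] = chunk
        refs.append(fp)       # duplicates become references only
    return refs
```

For example, writing `b"aaaabbbbaaaa"` with a chunk size of 4 stores only two chunks while returning three references, since the first and last chunks are identical.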


A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems

Deduplication has been largely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems such as no central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a sys...


Deduplication Strategy for Efficient Use of Cloud Storage

With the enormous creation of data in day-to-day life, storing it costs a lot of space, be it on a personal computer, a private cloud, a public cloud, or any reusable medium. The storage and transfer cost of data can be reduced by storing a unique copy of duplicate data. This gives birth to data deduplication, which is one of the important data compression techniques and has been widely used in cl...



Journal:

Volume   Issue 

Pages  -

Publication date: 2013